Synchronous Binarization for Machine Translation
نویسندگان
چکیده
Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages, and rules extracted from parallel corpora can be quite large. We devise a linear-time algorithm for factoring syntactic re-orderings by binarizing synchronous rules when possible and show that the resulting rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system.
منابع مشابه
Binarization, Synchronous Binarization, and Target-side Binarization
Binarization is essential for achieving polynomial time complexities in parsing and syntax-based machine translation. This paper presents a new binarization scheme, target-side binarization, and compares it with source-side and synchronous binarizations on both stringbased and tree-based systems using synchronous grammars. In particular, we demonstrate the effectiveness of targetside binarizati...
متن کاملBetter Synchronous Binarization for Machine Translation
Binarization of Synchronous Context Free Grammars (SCFG) is essential for achieving polynomial time complexity of decoding for SCFG parsing based machine translation systems. In this paper, we first investigate the excess edge competition issue caused by a leftheavy binary SCFG derived with the method of Zhang et al. (2006). Then we propose a new binarization method to mitigate the problem by e...
متن کاملAsynchronous Binarization for Synchronous Grammars
Binarization of n-ary rules is critical for the efficiency of syntactic machine translation decoding. Because the target side of a rule will generally reorder the source side, it is complex (and sometimes impossible) to find synchronous rule binarizations. However, we show that synchronous binarizations are not necessary in a two-stage decoder. Instead, the grammar can be binarized one way for ...
متن کاملBinarization of Synchronous Context-Free Grammars
Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages. We develop a theory of binarization for synchronous context-free grammars and present a lin...
متن کاملAn Alternative to Synchronous Tree Substitution Grammars 3 2 Preliminaries
Synchronous tree substitution grammars (stsg) are a (formal) tree transformation model that is used in the area of syntax-based machine translation. A competitor that is at least as expressive as stsg is proposed and compared to stsg. The competitor is the extended multi bottom-up tree transducer (mbot), which is the bottom-up analogue with the additional feature that states have non-unary rank...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006